Abstract

An edge-operation on a graph G is defined to be either the deletion of an existing
edge or the addition of a nonexisting edge. Given a family of graphs $\mathcal{G}$, the editing
distance from G to $\mathcal{G}$ is the smallest number of edge-operations needed to modify G
into a graph from $\mathcal{G}$. In this paper, we fix a graph H and consider Forb(n, H), the set
of all graphs on n vertices that have no induced copy of H. We provide bounds for
the maximum over all n-vertex graphs G of the editing distance from G to Forb(n, H),
using an invariant we call the binary chromatic number of the graph H. We give
asymptotically tight bounds for that distance when H is self-complementary and exact
results for several small graphs H.

1 Introduction 

The investigation of graphs not containing subgraphs with given properties is a classical
problem. For example, determining the maximum number of edges in a graph with no copy
of a fixed subgraph H has been studied intensively for the last 70 years [13, 20]. Very
often, though, the desired task is not to determine the extremal graph without a given fixed
subgraph, but rather to start with an arbitrary graph and modify it in a small number of
steps so that the resulting graph does not contain a forbidden subgraph.

The problem of modifying a given graph so that the resulting graph satisfies some
global properties has been addressed by Erdős et al. [...]. They investigated the number
of edge deletions sufficient to transform an arbitrary triangle-free graph into a bipartite graph,
as well as the smallest number of edge additions sufficient to decrease the diameter.

In this paper, we investigate the problem of transforming a given graph into a new graph
having the local property of avoiding a fixed induced subgraph. Starting with an arbitrary
graph G, we would like to calculate the minimum number of edges that must be added to
or deleted from G to obtain a graph not containing a fixed induced subgraph.

*Department of Mathematics, Iowa State University, Ames, IA 50011, axenovic@math.iastate.edu
†Department of Mathematics, University of Louisville, Louisville, KY 40292, kezdy@louisville.edu
‡Department of Mathematics, Iowa State University, Ames, IA 50011, rymartin@iastate.edu
§Corresponding author.

Formally, an edge-operation is defined to be either the deletion of an existing edge or the
addition of a nonexisting edge. Let Dist(G, H) denote the minimum number of edge-operations
needed to transform the graph G into a graph isomorphic to H. In other words, it can be
described as a minimum symmetric difference:

$$\mathrm{Dist}(G,H)=\min\{|E(G)\,\triangle\,E(H')| : H'\cong H,\ V(H')=V(G)\}.$$

Clearly this parameter is defined if and only if G and H have the same number of vertices.
If $\mathcal{H}$ is a class of graphs on n vertices, we define
$\mathrm{Dist}(G,\mathcal{H})=\min\{\mathrm{Dist}(G,H) : H\in\mathcal{H}\}$ for a graph G on n vertices.
Finally, $\mathrm{Dist}(n,\mathcal{H})=\max\{\mathrm{Dist}(G,\mathcal{H}) : |V(G)|=n\}$. We call
Dist(G, H) the editing distance, since the operations performed can be considered
as editing the edge set of a graph. Our interest here is the class $\mathcal{H}$ of graphs on n vertices
containing no copy of a given fixed graph H as an induced subgraph. We denote this class
by Forb(n, H) (or simply by Forb(H) when n is clear from the context). Similarly, Forb'(H)
is the family of all graphs on n vertices with no subgraph isomorphic to H.

This graph editing problem has numerous applications in computer science and
bioinformatics. For example, consider a metabolic network and identify genes with the vertices of a
graph and pairs of interacting genes with the edges of the graph. It is a fundamental question
in biology (from both evolutionary and practical points of view) to find how many edge-changes
must be performed in such a graph to avoid an induced subgraph corresponding to a certain
metabolic process. Another example involves consensus trees. It is known that two consensus
trees are comparable if there is no induced path on five vertices in a corresponding
bipartite graph [...]. In particular, finding the smallest number of edge-changes in such
a graph will determine the distance between these trees.

On the other hand, the editing problem for graphs corresponds to determining the distance
between {0, 1}-matrices. If A and B are the adjacency matrices of graphs $G_1$ and $G_2$,
respectively, then $\mathrm{Dist}(G_1, G_2)$ corresponds to the number of positions where A and B
differ, i.e., to the Hamming distance between A and B (up to a factor of 2, since each edge
occupies two symmetric positions, and after minimizing over all relabellings of $G_2$). Thus
finding the editing distance between classes of graphs provides the Hamming distance between
classes of symmetric matrices with the same diagonal entries. Moreover, when the graph editing
problem is restricted to bipartite graphs in which edge additions and deletions are limited to
edges between the partite sets, it corresponds to the problem of determining the distance
between sets of arbitrary {0, 1}-matrices.

We define the distance Dist'(n, Forb'(H)) analogously to Dist(n, Forb(H)), but in this
case we permit only edge-deletions. This quantity is always equal to $\mathrm{Dist}'(K_n, \mathrm{Forb}'(H))$,
i.e., it is the minimum number of edges in the complement of an H-free graph. If ex(n, H)
is the maximum number of edges in a graph on n vertices with no subgraph isomorphic to
H, then

$$\mathrm{Dist}'(n,\mathrm{Forb}'(H))=\binom{n}{2}-\mathrm{ex}(n,H). \qquad (1)$$

The asymptotic behavior of ex(n, H) is provided by the following theorem, which was
generalized by Erdős and Simonovits [...].

Theorem 1.1 (Erdős, Stone [...]) $\mathrm{ex}(n,H)=\left(1-\frac{1}{\chi(H)-1}+o(1)\right)\binom{n}{2}$.
In particular, the distance Dist'(n, Forb'(H)) is asymptotically determined by the chromatic
number of H.
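The definition of Dist(G, H) as a minimum symmetric difference over relabelled copies of H can be evaluated by brute force for very small graphs. The following Python sketch is our own illustration, not code from the paper: it assumes both graphs are given as edge lists on the vertex set {0, ..., n-1}, and the function name `dist` is hypothetical. Because it tries all n! relabellings, it is only usable for tiny n.

```python
from itertools import permutations


def dist(n, g_edges, h_edges):
    """Brute-force Dist(G, H): min |E(G) Δ E(H')| over copies H' of H on {0, ..., n-1}.

    Both graphs are assumed to live on the vertex set {0, ..., n-1}; all n!
    relabellings of H are tried, so this is only practical for very small n.
    """
    g = {frozenset(e) for e in g_edges}
    h = [tuple(e) for e in h_edges]
    best = None
    for perm in permutations(range(n)):
        # H' is the copy of H obtained by renaming vertex v to perm[v]
        h_prime = {frozenset((perm[u], perm[v])) for u, v in h}
        ops = len(g ^ h_prime)  # symmetric difference = number of edge-operations
        if best is None or ops < best:
            best = ops
    return best


path3 = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(dist(3, path3, triangle))  # prints 1: adding the edge 02 suffices
```

The same brute force applied to all graphs of a class gives $\mathrm{Dist}(G,\mathcal{H})$ for tiny n, but the search space grows too quickly for anything beyond a handful of vertices.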

Clearly, when the forbidden graph is complete or empty, finding the editing distance becomes
a trivial task, immediately reduced to Turán's theorem [22]. On the other hand, perhaps the
most interesting case is when the forbidden induced subgraph is self-complementary,
i.e., when edge-deletions and edge-additions carry "equal power". In
this case, we derive asymptotically tight estimates for Dist(n, Forb(H)). We also give
general bounds for other graphs. Our main tool in providing the lower bounds is Szemerédi's
Regularity Lemma, which allows us to express the bounds in terms of an invariant that
we designate the binary chromatic number. In defining it, we use the term coclique in
place of the term independent set.

Definition 1.2 The binary chromatic number of a graph G, $\chi_B(G)$, is the least integer
$k + 1$ such that, for all $c \in \{0, \ldots, k + 1\}$, there exists a partition of V(G) into c cliques and
$k + 1 - c$ cocliques.
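Definition 1.2 can be checked mechanically on small graphs. The sketch below is our own illustration (the helper names `can_split` and `binary_chromatic_number` are hypothetical): for each candidate value k + 1 it searches by backtracking for a partition into c cliques and k + 1 − c cocliques for every c, and increases k + 1 until all values of c succeed. It assumes that parts of a partition may be empty and that the graph is given as an edge list on {0, ..., n-1}.

```python
def can_split(n, adj, cliques, cocliques):
    """Backtracking test: can vertices 0..n-1 be split into `cliques` cliques and
    `cocliques` cocliques?  Empty parts are allowed (our reading of Definition 1.2)."""
    parts = [[] for _ in range(cliques + cocliques)]

    def fits(v, t):
        want_edge = t < cliques  # clique parts need all edges, coclique parts none
        return all((frozenset((u, v)) in adj) == want_edge for u in parts[t])

    def place(v):
        if v == n:
            return True
        tried_empty = set()
        for t in range(len(parts)):
            kind = t < cliques
            if not parts[t]:
                if kind in tried_empty:
                    continue  # empty parts of the same kind are interchangeable
                tried_empty.add(kind)
            if fits(v, t):
                parts[t].append(v)
                if place(v + 1):
                    return True
                parts[t].pop()
        return False

    return place(0)


def binary_chromatic_number(n, edges):
    """Least k + 1 such that, for every c, a split into c cliques and k + 1 - c cocliques exists."""
    adj = {frozenset(e) for e in edges}
    size = 1
    while not all(can_split(n, adj, c, size - c) for c in range(size + 1)):
        size += 1
    return size


c5 = [(i, (i + 1) % 5) for i in range(5)]
print(binary_chromatic_number(5, c5))  # prints 3
```

The loop always terminates, since with n parts every vertex can form its own singleton clique or coclique.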

This invariant was first introduced by Prömel and Steger (denoted $\tau$ in [...]) to express,
asymptotically, the number of n-vertex graphs that fail to have an induced copy of some
small, fixed graph H. In [...], it was generalized as the so-called colouring number of a
hereditary property $\mathcal{P}$. In particular, when $\mathcal{P} = \mathrm{Forb}(H)$, the colouring number of $\mathcal{P}$ is
exactly $\chi_B(H) - 1$. It should be noted that $\chi_B$ is not the cochromatic number (see [...]),
even though the definitions may seem similar at first glance. We use the term "binary
chromatic number" to emphasize the close connection to the chromatic number and
the complementary roles that cliques and cocliques play.

The following are the main results of this paper.

Theorem 1.3 If H is a graph with binary chromatic number k + 1, then

$$\mathrm{Dist}(n,\mathrm{Forb}(H))\ge(1-o(1))\frac{n^2}{4k}.$$



If $k = \chi_B(H) - 1$, then let $c_{\min}$ be the least c such that H cannot be partitioned into c cliques
and $k - c$ cocliques, and let $c_{\max}$ be the greatest such c. (Such values of c exist because k + 1 is
the least integer with the property in Definition 1.2.) We now have an upper bound
that can be expressed in terms of the binary chromatic number of H and the corresponding $c_{\min}$
and $c_{\max}$.

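Theorem 1.4 Let H be a graph with binary chromatic number k + 1, and let $c_{\min}$ and $c_{\max}$ be
defined as above. If $c_{\min} \le k/2 \le c_{\max}$, then

$$\mathrm{Dist}(n,\mathrm{Forb}(H))\le\frac{1}{2k}\binom{n}{2}. \qquad (2)$$

Otherwise, let $c_0$ be the one of $\{c_{\max}, c_{\min}\}$ that is closest to k/2. Then

$$\mathrm{Dist}(n,\mathrm{Forb}(H))\le\left(\frac{1}{1+2\sqrt{c_0(k-c_0)}/k}\right)\frac{1}{k}\binom{n}{2}\le\frac{1}{k}\binom{n}{2}. \qquad (3)$$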
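The quantities in Theorem 1.4 can be computed by brute force for a small graph H. The sketch below is our illustration only: it repeats the hypothetical backtracking helpers `can_split` and `binary_chromatic_number` so that it runs on its own, determines k, $c_{\min}$ and $c_{\max}$, and returns the coefficient of $\binom{n}{2}$ given by (2) or (3). It assumes H is an edge list on {0, ..., n-1} with at least two vertices (so that k ≥ 1) and that partitions may have empty parts.

```python
import math


def can_split(n, adj, cliques, cocliques):
    """Can vertices 0..n-1 be split into `cliques` cliques and `cocliques` cocliques (empty parts allowed)?"""
    parts = [[] for _ in range(cliques + cocliques)]

    def fits(v, t):
        want_edge = t < cliques
        return all((frozenset((u, v)) in adj) == want_edge for u in parts[t])

    def place(v):
        if v == n:
            return True
        tried_empty = set()
        for t in range(len(parts)):
            kind = t < cliques
            if not parts[t]:
                if kind in tried_empty:
                    continue  # skip symmetric choices among empty parts
                tried_empty.add(kind)
            if fits(v, t):
                parts[t].append(v)
                if place(v + 1):
                    return True
                parts[t].pop()
        return False

    return place(0)


def binary_chromatic_number(n, edges):
    adj = {frozenset(e) for e in edges}
    size = 1
    while not all(can_split(n, adj, c, size - c) for c in range(size + 1)):
        size += 1
    return size


def theorem_1_4_coefficient(n, edges):
    """Return (k, c_min, c_max, coef) where Theorem 1.4 gives Dist(n, Forb(H)) <= coef * C(n, 2)."""
    adj = {frozenset(e) for e in edges}
    k = binary_chromatic_number(n, edges) - 1
    bad = [c for c in range(k + 1) if not can_split(n, adj, c, k - c)]  # nonempty by minimality
    c_min, c_max = min(bad), max(bad)
    if c_min <= k / 2 <= c_max:
        return k, c_min, c_max, 1 / (2 * k)  # inequality (2)
    c0 = min((c_min, c_max), key=lambda c: abs(c - k / 2))
    return k, c_min, c_max, 1 / (k + 2 * math.sqrt(c0 * (k - c0)))  # inequality (3)


c5 = [(i, (i + 1) % 5) for i in range(5)]
print(theorem_1_4_coefficient(5, c5))  # prints (2, 0, 2, 0.25)
```

For $C_5$, which is self-complementary, $c_{\min} = 0 \le 1 \le 2 = c_{\max}$, so (2) applies with coefficient 1/4, as Corollary 1.5 predicts. For $K_3$ only c = 0 fails, and (3) gives the coefficient 1/2.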
Corollary 1.5 If H is a self-complementary graph with $\chi_B(H) = k + 1$,
then

$$\mathrm{Dist}(n,\mathrm{Forb}(H))=(1+o(1))\frac{n^2}{4k}.$$



In Section 2 we give preliminary definitions and results that aid in the proofs of the
theorems. Section 3 contains the proofs of the main theorems. We investigate properties of
the binary chromatic number in Section 4. Finally, Section 5 gives several exact results.

2 Definitions and preliminary results 

We denote by $K_n$, $E_n$, $C_n$, and $P_n$ a complete graph, an empty graph, a cycle, and a path on
n vertices, respectively. We also define $K_p^q$ to be a complete p-partite graph with each partite
set of cardinality q. We use $\overline{G}$ to denote the complement of G. For other definitions, we
refer the reader to [...].



Definition 2.1 For a graph G = (V, E) and two disjoint subsets A and B of vertices, the
density of the pair (A, B) is denoted d(A, B) and is given by the formula

$$d(A,B)=\frac{e(A,B)}{|A||B|},$$

where e(A, B) is the number of edges of G with one endpoint in A and the other in B.

Definition 2.2 For a graph G = (V, E) and two disjoint subsets A and B of vertices, the pair
(A, B) is $\varepsilon$-regular if

$$X\subseteq A,\quad Y\subseteq B,\quad |X|\ge\varepsilon|A|,\quad |Y|\ge\varepsilon|B|$$

imply

$$|d(X,Y)-d(A,B)|<\varepsilon;$$

otherwise, (A, B) is $\varepsilon$-irregular.

The proof of Theorem 2.7 makes use of the Regularity Lemma (see [17] and [...]).

Lemma 2.3 (Regularity Lemma [...]) For every positive $\varepsilon$ and positive integer m, there
are positive integers $M = M(\varepsilon, m)$ and $N = N(\varepsilon, m)$ with the following property: for every
graph G with at least N vertices there is a partition of the vertex set into $\ell + 1$ classes
(clusters)

$$V=V_0\cup V_1\cup V_2\cup\cdots\cup V_\ell$$

such that

1. $m\le\ell\le M$,

2. $|V_1|=|V_2|=\cdots=|V_\ell|$,

3. $|V_0|\le\varepsilon n$,

4. at most $\varepsilon\ell^2$ of the pairs $(V_i, V_j)$ are $\varepsilon$-irregular.
We give a name to the partition given by Lemma 2.3.

Definition 2.4 An $(m, \varepsilon, \ell)$-equipartition of a vertex set V is a partition
$V = V_0 \cup V_1 \cup \cdots \cup V_\ell$ such that conditions 1, 2 and 3 of the Regularity Lemma are
satisfied (with $M = M(\varepsilon, m)$ as defined by the Lemma).

In order to state our lower bound, we need to generalize the idea of an $\varepsilon$-regular pair.

Definition 2.5 An $\varepsilon$-regular r-tuple is an r-partite graph with partite sets $V_1, \ldots, V_r$ such
that $|V_i| = |V_j|$ and $(V_i, V_j)$ is an $\varepsilon$-regular pair for all i, j with $1 \le i < j \le r$.

We say that an $\varepsilon$-regular r-tuple is of size rL if $|V_1| = \cdots = |V_r| = L$. For $0 < \delta < 1/2$,
an $\varepsilon$-regular r-tuple has $\delta$-bounded density if $d(V_i, V_j) \in [\delta, 1 - \delta]$ whenever $1 \le i < j \le r$.

For convenience, we define an $(\varepsilon, r, L, \delta)$-configuration to be an $\varepsilon$-regular r-tuple of size
rL that has $\delta$-bounded density.
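Definitions 2.1, 2.2 and 2.5 can be tested directly on very small vertex sets. The following brute-force sketch is ours and exponential in the set sizes, so it is meant only to make the definitions concrete. It reads the size condition of Definition 2.2 as $|X| \ge \varepsilon|A|$ and $|Y| \ge \varepsilon|B|$, represents edges as a set of two-element frozensets, and assumes the parts passed to `is_configuration` are pairwise disjoint; all function names are hypothetical.

```python
import math
from itertools import combinations


def density(adj, A, B):
    """d(A, B) = e(A, B) / (|A| |B|) for disjoint vertex sets A and B (Definition 2.1)."""
    e = sum(1 for a in A for b in B if frozenset((a, b)) in adj)
    return e / (len(A) * len(B))


def large_subsets(S, eps):
    """All X ⊆ S with |X| >= eps |S|: the subsets tested in Definition 2.2."""
    S = sorted(S)
    for size in range(max(1, math.ceil(eps * len(S))), len(S) + 1):
        yield from combinations(S, size)


def is_eps_regular(adj, A, B, eps):
    """Brute-force check of Definition 2.2 (exponential; tiny sets only)."""
    d_ab = density(adj, A, B)
    return all(abs(density(adj, X, Y) - d_ab) < eps
               for X in large_subsets(A, eps) for Y in large_subsets(B, eps))


def is_configuration(adj, parts, eps, delta):
    """Check the (eps, r, L, delta)-configuration conditions for disjoint vertex sets `parts`."""
    if len({len(P) for P in parts}) != 1:
        return False  # Definition 2.5 requires |V_1| = ... = |V_r|
    return all(delta <= density(adj, P, Q) <= 1 - delta and is_eps_regular(adj, P, Q, eps)
               for P, Q in combinations(parts, 2))


A, B = {0, 1, 2, 3}, {4, 5, 6, 7}
half = {frozenset((a, b)) for a in (0, 1) for b in B}  # only 0 and 1 see B
print(is_eps_regular(half, A, B, 0.25))  # prints False: X = {0, 1}, Y = B has density 1
```

In the example the pair has density exactly 1/2, yet it is far from regular, because the subset {0, 1} of A is completely joined to B.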

The following theorem is our major tool in proving the main result. We prove it in
Section 3.

Theorem 2.6 Let $r \ge 2$ be an integer and let $\delta$ and $\varepsilon$ be real numbers with $0 < \delta < 1/2$,
$\varepsilon > 0$ and $\varepsilon < \delta/(16r - 16)$. There is a graph G on n vertices and a constant
$M(\varepsilon)$ such that if the number of edge-deletions and edge-additions performed on G is less
than $\frac{n^2}{4(r-1)}(1-3\delta)(1-\varepsilon)^2$, then the resulting graph contains an $(\varepsilon, r, L, \delta)$-configuration with
$L \ge n(1-\varepsilon)/M(\varepsilon)$.

The following theorem is essentially Lemma 3.5 in [19]. We shall use it to prove Theorem 1.3.

Theorem 2.7 (Prömel, Steger [19]) Let H be a fixed graph with binary chromatic number
r and let $\delta$ be a real number with $0 < \delta < 1/2$. There exists an $\varepsilon_0 = \varepsilon_0(H, \delta, r) > 0$ such
that for all $\varepsilon$ with $0 < \varepsilon < \varepsilon_0$, there exists an $n_0 = n_0(H, \delta, \varepsilon, r)$ such that every graph
G = (V, E) on $n \ge n_0$ vertices has the following property: let $V = V_0 \cup V_1 \cup \cdots \cup V_\ell$ be
an $(r, \varepsilon, \ell)$-equipartition with $|V_1| = \cdots = |V_\ell| = L$ for $L \ge n/M(\varepsilon, r)$, where $M(\varepsilon, r)$ is
the constant given in Lemma 2.3 for $\varepsilon$ and r, such that $(V_1, \ldots, V_r)$ forms an $(\varepsilon, r, L, \delta)$-configuration.
Then the subgraph induced by $\bigcup_{i=1}^{r} V_i$ contains the graph H as an induced
subgraph.

For functions f = f(n) and g = g(n), let $f = \omega(g)$ be the usual asymptotic notation denoting
that $g/f \to 0$ as $n \to \infty$.
Lemma 2.8 For a positive real number $\varepsilon$ and positive integer m, let $\ell$ have the property that
$m \le \ell \le M(\varepsilon, m)$, where $M(\varepsilon, m)$ is the constant given in the Regularity Lemma (Lemma 2.3).
Let $f = f(n) = \omega(n^{-1/2})$. Then, for n large enough, there is a graph G on n vertices such
that for any $(m, \varepsilon, \ell)$-equipartition, all pairs of clusters $(V_i, V_j)$, $1 \le i < j \le \ell$, are $\varepsilon$-regular
with density in the interval $(1/2 - f, 1/2 + f)$.

The proof of this lemma is a routine calculation that we include in the Appendix for
completeness.



3 The proofs 

In order to prove Theorem 1.3, we prove Theorem 2.6, which basically asserts that a graph
described in Lemma 2.8 requires many editing operations to eliminate all $(\varepsilon, r, L, \delta)$-configurations,
and hence, by Theorem 2.7, all induced copies of H.



3.1 Proof of Theorem 2.6

Fix $\delta$ such that $0 < \delta < 1/2$. Let G be a graph on n vertices as described in Lemma 2.8,
with $\varepsilon$, r and L as given in that Lemma. Let G' be a graph with no $(\varepsilon, r, L, \delta)$-configuration
having the least distance from G.

Apply the Regularity Lemma to G' with parameters $\varepsilon$ and $m = \varepsilon^{-1}$ to get $\ell + 1$ clusters
$V_0, V_1, \ldots, V_\ell$ (we have $\ell \ge \varepsilon^{-1}$) such that $|V_1| = \cdots = |V_\ell| = L$. Furthermore, all but at most
$\varepsilon\ell^2$ pairs $(V_i, V_j)$, $1 \le i < j \le \ell$, are $\varepsilon$-regular. Recalling Definition 2.5, we say that an $\varepsilon$-regular
pair is $\delta$-bounded if its density is at least $\delta$ and at most $1 - \delta$; otherwise, it is $\delta$-unbounded.

Since G' does not have an $(\varepsilon, r, L, \delta)$-configuration, it is not possible to have a set of
r clusters such that between any two of them there is a $\delta$-bounded $\varepsilon$-regular pair. Thus,
according to Turán's theorem, the number of pairs of clusters $(V_i, V_j)$ that induce either a
$\delta$-unbounded $\varepsilon$-regular pair or an $\varepsilon$-irregular pair is at least

$$\binom{\ell}{2}-\left(1-\frac{1}{r-1}\right)\frac{\ell^2}{2}=\frac{\ell^2}{2(r-1)}-\frac{\ell}{2}.$$

The number of $\varepsilon$-irregular pairs $(V_i, V_j)$ is at most $\varepsilon\ell^2$, thus the number of $\delta$-unbounded
$\varepsilon$-regular pairs is at least

$$\left(\frac{1}{2(r-1)}-\varepsilon-\frac{1}{2\ell}\right)\ell^2.$$

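Because G came from Lemma 2.8, if some pair $(V_i, V_j)$ were $\delta$-unbounded in G', then at
least $(1/2 - \delta - o(1))L^2$ edges had to be either added or deleted between $V_i$ and $V_j$ in order
to get G' from G. Hence, the total number of edges that had to be changed is at least

$$L^2\ell^2\left(\frac{1}{2(r-1)}-\varepsilon-\frac{1}{2\ell}\right)\left(\frac12-\delta-o(1)\right)\ge\frac{\ell^2L^2}{4(r-1)}(1-3\delta).$$

The inequality is valid as long as $8(r-1)(\varepsilon+\ell^{-1}/2)<\delta$, which holds under the hypothesis
$\varepsilon < \delta/(16r - 16)$ of Theorem 2.6, since $\ell \ge \varepsilon^{-1}$.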

Since $\ell L \ge n(1 - \varepsilon)$, the total number of edges that have to be altered in order to obtain
G' from G is at least

$$\frac{n^2}{4(r-1)}(1-\varepsilon)^2(1-3\delta).$$
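The arithmetic behind the last two displays can be spot-checked numerically. The sketch below is our illustration (function names are hypothetical): it drops the o(1) term, takes $\varepsilon$ just below $\delta/(16r - 16)$ and $\ell = \lceil 1/\varepsilon \rceil$, and confirms both the side condition $8(r-1)(\varepsilon + \ell^{-1}/2) < \delta$ and the counting inequality, divided by $L^2$, on a small grid of r and $\delta$.

```python
import math


def sides(r, delta, eps, ell):
    """Both sides of the counting inequality in Section 3.1, divided by L^2,
    with the o(1) term dropped (a simplifying assumption of this sketch)."""
    unbounded = ell**2 / (2 * (r - 1)) - ell / 2 - eps * ell**2  # Turán count minus irregular pairs
    lhs = unbounded * (0.5 - delta)
    rhs = ell**2 * (1 - 3 * delta) / (4 * (r - 1))
    return lhs, rhs


def all_cases_hold():
    for r in range(2, 8):
        for delta in (0.01, 0.05, 0.1, 0.2, 0.3):
            eps = 0.99 * delta / (16 * (r - 1))  # hypothesis of Theorem 2.6
            ell = math.ceil(1 / eps)              # the Regularity Lemma gives ell >= 1/eps
            if not 8 * (r - 1) * (eps + 1 / (2 * ell)) < delta:
                return False                      # side condition stated after the display
            lhs, rhs = sides(r, delta, eps, ell)
            if lhs < rhs:
                return False
    return True


print(all_cases_hold())  # prints True
```

The margin is comfortable: with $\varepsilon + 1/(2\ell) \le 3\varepsilon/2$, the left-hand factor loses less than a $\delta/4$ fraction, while the right-hand side gives up $3\delta/2$ against the $1/2 - \delta$ factor.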



3.2 Proof of Theorem 1.3

Choose $\delta > 0$ arbitrarily small and $\varepsilon > 0$ small enough for Theorems 2.6 and 2.7 to apply, and
let G be the graph guaranteed by Theorem 2.6 with r = k + 1. If fewer than
$\frac{n^2}{4k}(1-3\delta)(1-\varepsilon)^2$ edge-operations are performed on G to obtain a graph G', then there are
disjoint vertex sets $V_1, \ldots, V_{k+1}$ in G' that satisfy the conditions of Theorem 2.7. Theorem 2.7
then implies that G' contains an induced H. Thus the editing distance Dist(G, Forb(H)) is
at least $\frac{n^2}{4k}(1-3\delta)(1-\varepsilon)^2$. ■



3.3 Proof of Theorem 1.4

Lemma 3.1 emphasizes the importance of c in the definition of $\chi_B$.

Lemma 3.1 Let H be a graph with binary chromatic number k + 1 and let c be an integer,
$0 \le c \le k$, such that H cannot be covered by exactly c cliques and $k - c$ independent sets. Let
G be a graph on n vertices with density $d = e(G)/\binom{n}{2}$. As long as it is not the case that both
d = 0 and c = k, or both d = 1 and c = 0,

$$\mathrm{Dist}(G,\mathrm{Forb}(H))\le\frac{d(1-d)}{dc+(1-d)(k-c)}\binom{n}{2}; \qquad (4)$$

otherwise, $\mathrm{Dist}(G,\mathrm{Forb}(H))\le\frac{1}{k}\binom{n}{2}$.

Proof. In order to prove the statement of the Lemma, we provide a probabilistic algorithm
that adds and deletes some edges of G so that the resulting graph has no induced copy of H.
We begin by assigning colors independently to the vertices of G: each of $1, \ldots, c$ with probability
p and each of $c + 1, \ldots, k$ with probability q. Call such a coloring g. If $g(x) = g(y) \in \{1, \ldots, c\}$
and $xy \notin E(G)$, then add the edge xy to E(G). If $g(x) = g(y) \in \{c + 1, \ldots, k\}$ and
$xy \in E(G)$, then delete xy from E(G). As a result, we obtain a graph G' with the vertex set
partitioned into k subsets. The first c of these subsets induce cliques and the others induce
cocliques. Since the vertices of H cannot be partitioned into c cliques and $k - c$ cocliques,
H is not an induced subgraph of G'.
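The recolouring step just described is easy to state as code. In this sketch (our illustration, not the authors' implementation; the name `edit_by_coloring` is hypothetical), a colouring is a list with values in {1, ..., k}, edges are two-element frozensets, and the function returns the edited edge set together with the number of edge-operations it used. Choosing the colouring at random, as in the proof, is left to the caller.

```python
from itertools import combinations


def edit_by_coloring(n, adj, coloring, c):
    """Apply the editing rule from the proof of Lemma 3.1 for a given colouring.

    coloring[v] is in {1, ..., k}; colour classes 1..c are turned into cliques and
    classes c+1..k into cocliques.  Returns (edited edge set, number of edge-operations).
    """
    new_adj = set(adj)
    ops = 0
    for x, y in combinations(range(n), 2):
        if coloring[x] != coloring[y]:
            continue  # pairs in different classes are never touched
        pair = frozenset((x, y))
        if coloring[x] <= c and pair not in new_adj:
            new_adj.add(pair)  # complete a clique class
            ops += 1
        elif coloring[x] > c and pair in new_adj:
            new_adj.remove(pair)  # empty a coclique class
            ops += 1
    return new_adj, ops


c5 = {frozenset((i, (i + 1) % 5)) for i in range(5)}
edited, ops = edit_by_coloring(5, c5, [1, 1, 2, 2, 2], c=1)
print(ops)  # prints 2: edges 23 and 34 are removed from the coclique class {2, 3, 4}
```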
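Since a pair of vertices receives the same color $i \le c$ with probability $cp^2$ in total, and the same
color $i > c$ with probability $(k - c)q^2$ in total, the expected number of changes is

$$f(p,q)=\left(\binom{n}{2}-e(G)\right)cp^2+e(G)(k-c)q^2=\left((1-d)cp^2+d(k-c)q^2\right)\binom{n}{2}.$$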
We also have the restriction

$$cp+(k-c)q=1. \qquad (5)$$

As long as we do not have the case that both d = 0 and c = k or the case that both d = 1
and c = 0, the method of Lagrange multipliers gives that the minimum of f(p, q) subject
to (5) occurs when $p = d/(dc + (1 - d)(k - c))$ and $q = (1 - d)/(dc + (1 - d)(k - c))$, and is
equal to

$$\frac{d(1-d)}{dc+(1-d)(k-c)}\binom{n}{2}.$$
Since this is the expected number of changes, there exists a coloring of the vertices of G
such that the above procedure requires at most $\frac{d(1-d)}{dc+(1-d)(k-c)}\binom{n}{2}$ changes to make the graph
H-free.

If both d = 0 and c = k, then perform the above procedure, but fix p = 1/k. If both d = 1
and c = 0, then perform the above procedure, but fix q = 1/k. (These are the values forced
by (5).) In both cases, the expected number of changes to be performed is $\frac{1}{k}\binom{n}{2}$. ■
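The Lagrange-multiplier computation can be sanity-checked numerically. The sketch below is our illustration and assumes 0 < c < k: it eliminates q through the restriction (5), minimises the expected number of changes per pair over a fine grid of p, and compares the result with the closed form $d(1-d)/(dc + (1-d)(k-c))$. The function names are hypothetical.

```python
def expected_changes(d, c, k, p):
    """Expected edge-operations divided by C(n, 2): (1-d) c p^2 + d (k-c) q^2,
    with q determined by the restriction (5), c p + (k - c) q = 1."""
    q = (1 - c * p) / (k - c)
    return (1 - d) * c * p ** 2 + d * (k - c) * q ** 2


def lagrange_value(d, c, k):
    """Closed-form minimum stated in the proof of Lemma 3.1."""
    return d * (1 - d) / (d * c + (1 - d) * (k - c))


def grid_minimum(d, c, k, steps=100_000):
    """Minimise over p in [0, 1/c], which keeps q >= 0 (requires 0 < c < k)."""
    return min(expected_changes(d, c, k, i / (steps * c)) for i in range(steps + 1))


for d, c, k in [(0.3, 1, 3), (0.5, 2, 5), (0.8, 3, 4)]:
    print(d, c, k, round(lagrange_value(d, c, k), 6), round(grid_minimum(d, c, k), 6))
```

Since f is a convex quadratic in p along the line (5), the grid minimum agrees with the closed form up to the grid resolution.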

In order to prove inequality (2) of Theorem 1.4, we use Lemma 3.1 and find conditions
when

$$\frac{d(1-d)}{dc+(1-d)(k-c)}\le\frac{1}{2k}. \qquad (6)$$

If $c \le k/2$, then (6) holds when $d \in [0, 1/2] \cup [1 - c/k, 1]$. If $c \ge k/2$, then (6) holds
when $d \in [0, 1 - c/k] \cup [1/2, 1]$. Consider a graph G of density d and an H for which
$c_{\min} \le k/2 \le c_{\max}$. If $d \le 1/2$, then choose $c = c_{\min}$; otherwise, choose $c = c_{\max}$. As a result,
$\mathrm{Dist}(G,\mathrm{Forb}(H))\le\frac{1}{2k}\binom{n}{2}$.
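The two interval claims for (6) can be verified with exact rational arithmetic. The sketch below is ours: on a grid of densities d it compares the set where (6) holds with the predicted union of intervals, for all $2 \le k \le 8$ and $0 < c < k$. Using `fractions.Fraction` avoids rounding at the endpoints d = 1/2 and d = 1 − c/k; the function names are hypothetical.

```python
from fractions import Fraction


def ratio(d, c, k):
    """Coefficient of C(n, 2) in (4) as a function of the density d."""
    return d * (1 - d) / (d * c + (1 - d) * (k - c))


def predicted(d, c, k):
    """The union of intervals on which the proof says (6) holds."""
    lo, hi = sorted((Fraction(1, 2), 1 - Fraction(c, k)))
    return d <= lo or d >= hi


def regions_match(k_max=8, samples=200):
    for k in range(2, k_max + 1):
        for c in range(1, k):  # 0 < c < k keeps the denominator of (4) positive
            for i in range(samples + 1):
                d = Fraction(i, samples)
                holds = ratio(d, c, k) <= Fraction(1, 2 * k)
                if holds != predicted(d, c, k):
                    return False
    return True


print(regions_match())  # prints True
```

The match is exact because clearing the positive denominator turns (6) into $2kd^2 - (3k - 2c)d + (k - c) \ge 0$, whose roots are 1/2 and $1 - c/k$.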

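In order to prove inequality (3) of Theorem 1.4, we need to maximize expression (4) over
d. For $c \ne k/2$, the maximum value occurs when $d=\frac{k-c-\sqrt{c(k-c)}}{k-2c}$ and equals

$$\frac{k-2\sqrt{c(k-c)}}{(k-2c)^2}\binom{n}{2}=\left(\frac{1}{1+2\sqrt{c(k-c)}/k}\right)\frac{1}{k}\binom{n}{2}.$$

The expression in parentheses is at most 1. Taking $c = c_0$ gives (3).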
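The maximisation over d can likewise be checked numerically. This sketch (ours, for 0 < c < k with c ≠ k/2; function names are hypothetical) compares the closed-form maximiser and maximum with a grid search, and with the equivalent form $1/(k + 2\sqrt{c(k-c)})$, which is at most 1/k.

```python
import math


def ratio(d, c, k):
    """Coefficient of C(n, 2) in (4) as a function of the density d."""
    return d * (1 - d) / (d * c + (1 - d) * (k - c))


def argmax_closed_form(c, k):
    """Maximiser d = (k - c - sqrt(c(k - c))) / (k - 2c), valid for 0 < c < k, c != k/2."""
    return (k - c - math.sqrt(c * (k - c))) / (k - 2 * c)


def max_closed_form(c, k):
    """(k - 2 sqrt(c(k - c))) / (k - 2c)^2, which equals 1 / (k + 2 sqrt(c(k - c)))."""
    return (k - 2 * math.sqrt(c * (k - c))) / (k - 2 * c) ** 2


def max_on_grid(c, k, steps=100_000):
    return max(ratio(i / steps, c, k) for i in range(steps + 1))


for c, k in [(1, 3), (1, 4), (3, 4), (2, 5)]:
    print(c, k, argmax_closed_form(c, k), max_closed_form(c, k), max_on_grid(c, k))
```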



3.4 Proof of Corollary 1.5

Let H be a self-complementary graph with $\chi_B(H) = k + 1$, and let $c_{\min}$ and $c_{\max}$ be
defined as in preparation for Theorem 1.4. That is, $c_{\min}$ is the least c such that H cannot be
partitioned into c cliques and $k - c$ cocliques, and $c_{\max}$ is the greatest such c.

Because $H \cong \overline{H}$, H can be partitioned into c cliques and $k - c$ cocliques if and only if H
can be partitioned into $k - c$ cliques and c cocliques. Hence, $c_{\max} = k - c_{\min}$, and it must be
the case that $c_{\min} \le k/2 \le c_{\max}$. Now the result follows from Theorem 1.3 and the first
inequality of Theorem 1.4. ■
4 Binary chromatic number

It is easy to see the following.

Fact 4.1 Let G be a graph.

1. $\chi_B(G) \ge \chi(G), \chi(\overline{G})$.

2. $\chi_B(G) = \chi_B(\overline{G})$.

Recall that $K_p^q$ is the complete p-partite graph with q vertices in each part.

Proposition 4.2 Let G be a graph. Then

$$\chi_B(G)\le\chi(G)+\chi(\overline{G})-1.$$

This bound is tight for $G = K_p^q$.

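Proof. Let $k + 1 = \chi(G) + \chi(\overline{G}) - 1$ and fix $c \in \{0, \ldots, k + 1\}$. Consider c cliques, taken from a
partition of V(G) into $\chi(\overline{G})$ cliques, spanning a set A of vertices in G. If $c < \chi(\overline{G})$, we are done,
since $\chi(G - A) \le \chi(G) \le k + 1 - c$. Otherwise, $c \ge \chi(\overline{G})$ and it is possible to partition all vertices
into c cliques. We can obtain the required cocliques by considering single vertices.

We see that $\chi_B(K_p^q) \ge p + q - 1$ by observing that if we require $q - 1$ cliques in a partition
of the vertex set of $K_p^q$ into cliques and cocliques, then $p - 1$ cocliques are not enough to partition
the rest of the vertices: each clique meets each part in at most one vertex, so at least one vertex
of every part remains, and each coclique lies within a single part. ■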

Next we determine the binary chromatic number of some classes of graphs.

Proposition 4.3 Let $\chi_B(G)$ denote the binary chromatic number of a graph G.

1. If $n \ge 5$, then $\chi_B(C_n) = \lceil n/2 \rceil$.

2. If $n \ge 3$, then $\chi_B(P_n) = \lceil n/2 \rceil$.

3. $\chi_B(K_p^q) = p + q - 1$.

Proof.

1. The lower bound follows from Fact 4.1(1). For the upper bound, we can construct a
partition of the vertex set into $\lceil n/2 \rceil$ cliques and cocliques as follows. If we need only
cliques, or only cocliques, it is clear. When we need at least one clique and at least
one coclique in the partition, take a largest coclique, on $\lfloor n/2 \rfloor$ vertices. The leftover
graph consists of independent vertices and, if n is odd, of one edge. Take this edge (or
a single vertex when n is even) as a clique of our partition. The number of leftover
vertices is $\lceil n/2 \rceil - 2$ and we are done.
2. This is quite similar to the case of $C_n$. We leave it to the reader.

3. This follows from Proposition 4.2 and its proof, since $\chi(K_p^q) = p$ and $\chi(\overline{K_p^q}) = q$. ■
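Proposition 4.3 can be confirmed by exhaustive search for small parameters. The following sketch repeats a backtracking routine for $\chi_B$ (our own hypothetical helper, allowing empty parts) and tabulates $\chi_B$ for cycles, paths and complete multipartite graphs $K_p^q$, so that the values can be compared with $\lceil n/2 \rceil$ and $p + q - 1$. The vertex numbering used by the graph builders is our choice.

```python
from itertools import combinations


def can_split(n, adj, cliques, cocliques):
    """Can vertices 0..n-1 be split into `cliques` cliques and `cocliques` cocliques (empty parts allowed)?"""
    parts = [[] for _ in range(cliques + cocliques)]

    def fits(v, t):
        want_edge = t < cliques
        return all((frozenset((u, v)) in adj) == want_edge for u in parts[t])

    def place(v):
        if v == n:
            return True
        tried_empty = set()
        for t in range(len(parts)):
            kind = t < cliques
            if not parts[t]:
                if kind in tried_empty:
                    continue  # skip symmetric choices among empty parts
                tried_empty.add(kind)
            if fits(v, t):
                parts[t].append(v)
                if place(v + 1):
                    return True
                parts[t].pop()
        return False

    return place(0)


def binary_chromatic_number(n, edges):
    adj = {frozenset(e) for e in edges}
    size = 1
    while not all(can_split(n, adj, c, size - c) for c in range(size + 1)):
        size += 1
    return size


def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]


def path(n):
    return [(i, i + 1) for i in range(n - 1)]


def complete_multipartite(p, q):
    """K_p^q: vertex v lies in part v // q; vertices in different parts are adjacent."""
    return [(u, v) for u, v in combinations(range(p * q), 2) if u // q != v // q]


for n in range(5, 9):
    print("C", n, binary_chromatic_number(n, cycle(n)), (n + 1) // 2)
for n in range(3, 9):
    print("P", n, binary_chromatic_number(n, path(n)), (n + 1) // 2)
for p, q in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    print("K", p, q, binary_chromatic_number(p * q, complete_multipartite(p, q)), p + q - 1)
```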



Proposition 4.4 gives bounds on the smallest binary chromatic number among all
n-vertex graphs.

Proposition 4.4 If n is a positive integer, then

$$\sqrt{n}\le\min_{|V(G)|=n}\chi_B(G)\le\sqrt{n}+(1+o(1))n^{0.2625}.$$

Moreover, there are infinitely many graphs for which the lower bound is attained.

Proof. For the lower bound, we use Fact 4.1(1) and the fact that $\chi(G)\chi(\overline{G}) \ge n$. As a
result, one of $\chi(G)$, $\chi(\overline{G})$ is at least $\sqrt{n}$.

The lower bound is, in fact, attained by an infinite class of graphs on $n = k^2$ vertices,
where k is a prime. To realize this lower bound, consider the following construction of a
graph $G = G_n = G_{k^2}$.

Let $V(G_n)$ be the set of pairs of integers (i, j), for $i, j = 1, \ldots, k$. We create k + 1 distinct partitions
of V(G) into sets of cardinality k. For $i = 0, \ldots, k - 1$, let the i-th partition $P_i = \{V_1^i, V_2^i, \ldots, V_k^i\}$
be defined by $V_j^i = \{(j, 1), (j + i, 2), (j + 2i, 3), \ldots, (j + (k - 1)i, k)\}$, where addition in the first
coordinate is taken modulo k; let $P_k = \{V_1^k, \ldots, V_k^k\}$ with $V_j^k = \{(1, j), (2, j), \ldots, (k, j)\}$.
Let $V_j^i$ induce a clique if $i < j$ and a coclique if $i \ge j$. Next we verify that G is well defined.

Note that for each pair of vertices $x, y \in V(G)$, we have $x, y \in V_j^i$ for some i, j. Moreover, if
$x, y \in V_j^i$, then x and y do not both lie in $V_{j'}^{i'}$ for $i' \ne i$. Indeed, if $x, y \in V_j^i$ and
$x = (x_1, x_2)$, then $y = (x_1 + li, x_2 + l)$. If $x, y \in V_{j'}^{i'}$, then $y = (x_1 + l'i', x_2 + l')$. Now, since
$x_2 + l = x_2 + l'$, we have $l \equiv l' \pmod{k}$. Thus $x_1 + li \equiv x_1 + li'$, therefore $li \equiv li'$,
and $i = i'$ since k is prime and $l \not\equiv 0$ (as $x \ne y$).

We see that $P_i$ provides a vertex-partition of G into i cocliques and $k - i$ cliques, $0 \le i \le k$.
Therefore $\chi_B(G) \le k = \sqrt{n}$.
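The construction of $G_{k^2}$ can be built and checked in a few lines. The sketch below is our illustration: it assumes that, besides the partitions $P_0, \ldots, P_{k-1}$ given by the formula for $V_j^i$, the k-th partition $P_k$ consists of the sets {(1, j), (2, j), ..., (k, j)}, i.e., the sets of vertices sharing their second coordinate, which are needed so that every pair of vertices lies in some set. It raises an error if some pair of vertices lies in two sets, as happens for k = 4, and otherwise reports, for each $P_i$, how many of its sets are cocliques and how many are cliques. All names are hypothetical.

```python
from itertools import combinations


def affine_partitions(k):
    """Partitions P_0, ..., P_k of {1..k}^2 into k sets of size k (P_k is our assumed class)."""
    parts = [[frozenset(((j - 1 + t * i) % k + 1, t + 1) for t in range(k))  # V_j^i, first coord mod k
              for j in range(1, k + 1)] for i in range(k)]
    parts.append([frozenset((x, j) for x in range(1, k + 1)) for j in range(1, k + 1)])
    return parts


def build_graph(k):
    """Return (partitions, edge set) of G_{k^2}; V_j^i is a clique iff j > i."""
    parts = affine_partitions(k)
    owner = {}
    for i, P in enumerate(parts):
        for j, V in enumerate(P, start=1):
            for pair in combinations(sorted(V), 2):
                if pair in owner:
                    raise ValueError(f"pair {pair} lies in two sets; is k = {k} prime?")
                owner[pair] = (i, j)
    n = k * k
    if len(owner) != n * (n - 1) // 2:
        raise ValueError("some pair of vertices lies in no set")
    edges = {pair for pair, (i, j) in owner.items() if j > i}
    return parts, edges


def partition_types(k):
    """For each P_i, return (number of cocliques, number of cliques) among its sets."""
    parts, edges = build_graph(k)
    counts = []
    for P in parts:
        cliques = cocliques = 0
        for V in P:
            inside = [pair in edges for pair in combinations(sorted(V), 2)]
            if all(inside):
                cliques += 1
            elif not any(inside):
                cocliques += 1
            else:
                raise AssertionError("a set is neither a clique nor a coclique")
        counts.append((cocliques, cliques))
    return counts


print(partition_types(5))  # prints [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]
```

For prime k the sets form the lines of the affine plane over $\mathbb{Z}_k$, so every pair is covered exactly once, and $P_i$ indeed splits $G_{k^2}$ into i cocliques and $k - i$ cliques.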

For arbitrary n, we find the upper bound by taking the smallest prime $k \ge \sqrt{n}$. Consider
$G_{k^2}$ as defined above and let $G_n$ be a subgraph of $G_{k^2}$ induced by a set of n
vertices. As we have shown, $\chi_B(G_{k^2}) \le k$, which implies $\chi_B(G_n) \le k$. By a result of Baker,
Harman and Pintz [...], for x at least some $x_0$, there is a prime in the interval $[x, x + x^{0.525}]$.
Thus, $\chi_B(G_n) \le k \le \sqrt{n} + (1 + o(1))n^{0.2625}$. ■



5 Better bounds for small graphs

The results of the previous sections are asymptotic. However, for some graphs H we are able to
determine the exact value of Dist(n, Forb(H)).
Here we shall use the fact that the extremal graphs for forbidden induced subgraphs on
three vertices, as well as for induced subgraphs on 4 vertices and 3 edges, are known precisely
[...].



Theorem 5.1 If $H \in \{K_3, \overline{K_3}, K_{1,2}, \overline{K_{1,2}}\}$, then
$$\mathrm{Dist}(n,\mathrm{Forb}(H))=\binom{\lceil n/2\rceil}{2}+\binom{\lfloor n/2\rfloor}{2}.$$

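Proof. The cases of the triangle and of the empty graph follow immediately from
equation (1) (for $\overline{K_3}$, by taking complements). Since $\mathrm{Dist}(G, \mathrm{Forb}(H)) = \mathrm{Dist}(\overline{G}, \mathrm{Forb}(\overline{H}))$,
it also suffices to treat $K_{1,2}$ in order to handle $\overline{K_{1,2}}$.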

Now, we consider the editing distance for $K_{1,2}$-free graphs. Note that a graph which
contains no induced $K_{1,2}$ is a disjoint union of cliques.

Let G be an arbitrary graph on n vertices. If G has minimum degree at least $\lceil n/2 \rceil$, then
we add all missing edges to obtain a complete graph. In this case, at most

$$\binom{n}{2}-\frac{n}{2}\left\lceil\frac{n}{2}\right\rceil\le\binom{\lceil n/2\rceil}{2}+\binom{\lfloor n/2\rfloor}{2}$$

edges were added. Otherwise, delete all edges incident to a vertex v of degree
at most $\lceil n/2 \rceil - 1$ and apply induction to $G - v$. The total number of additions and deletions is
at most

$$\left\lceil\frac{n}{2}\right\rceil-1+\binom{\lceil (n-1)/2\rceil}{2}+\binom{\lfloor (n-1)/2\rfloor}{2}\le\binom{\lceil n/2\rceil}{2}+\binom{\lfloor n/2\rfloor}{2}.$$

This provides an upper bound on $\mathrm{Dist}(n, \mathrm{Forb}(K_{1,2}))$.

For the lower bound, we consider a complete bipartite graph H on n vertices with almost
equal parts A and B. Let G be a disjoint union of cliques $S_1, S_2, \ldots, S_k$ on the same vertex
set as H. Let $a_i = |A \cap V(S_i)|$ and $b_i = |B \cap V(S_i)|$, for $i = 1, \ldots, k$. It is clear that the
number of editing operations performed on H to obtain G is

$$s=\sum_{i=1}^{k}\left(\binom{a_i}{2}+\binom{b_i}{2}\right)+|A||B|-\sum_{i=1}^{k}a_ib_i=|A||B|-\frac{n}{2}+\frac12\sum_{i=1}^{k}(a_i-b_i)^2.$$

This function is minimized when $a_i = b_i$ for all i, except perhaps one $i \in \{1, \ldots, k\}$ such
that $|a_i - b_i| = 1$. Now, $s \ge n^2/4 - n/2$ for even n and $s \ge (n^2 - 1)/4 - (n - 1)/2 = (n - 1)^2/4$
for odd n, and the result follows. ■
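For n ≤ 5, Theorem 5.1 can be confirmed by exhaustive search. The brute-force sketch below is ours: it enumerates all graphs on n labelled vertices as bitmasks over the vertex pairs, collects those without an induced copy of H, and computes $\max_G \min_F |E(G) \triangle E(F)|$. The function names are hypothetical, and the method is far too slow beyond n ≈ 6.

```python
from itertools import combinations, permutations
from math import comb


def has_induced_copy(adj, n, h_edges, h_order):
    """Does the graph on {0..n-1} with edge set `adj` contain H as an induced subgraph?"""
    h_pairs = list(combinations(range(h_order), 2))
    for image in permutations(range(n), h_order):
        if all((frozenset((image[u], image[v])) in adj) == ((u, v) in h_edges)
               for u, v in h_pairs):
            return True
    return False


def dist_n_forb(n, h_edges, h_order):
    """Brute-force Dist(n, Forb(H)): max over G of the distance to the nearest H-free graph."""
    pairs = list(combinations(range(n), 2))
    masks = range(1 << len(pairs))  # bit b of a mask says whether pairs[b] is an edge

    def edges_of(mask):
        return {frozenset(pairs[b]) for b in range(len(pairs)) if mask >> b & 1}

    free = [f for f in masks if not has_induced_copy(edges_of(f), n, h_edges, h_order)]
    return max(min(bin(g ^ f).count("1") for f in free) for g in masks)


def formula(n):
    """Right-hand side of Theorem 5.1."""
    return comb((n + 1) // 2, 2) + comb(n // 2, 2)


K3 = {(0, 1), (0, 2), (1, 2)}
P3 = {(0, 1), (0, 2)}  # K_{1,2} with centre 0
for n in (3, 4, 5):
    print(n, formula(n), dist_n_forb(n, K3, 3), dist_n_forb(n, P3, 3))
```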



Let $\mathcal{Q}$ be the set of graphs on n vertices with no induced subgraph on 4 vertices and
3 edges. In [...] it was shown that, for any graph in $\mathcal{Q}$, either the graph or its complement is a
disjoint union of 4-cycles and trees on at most 3 vertices. Note that $G \in \mathcal{Q}$ if and only if $\overline{G} \in \mathcal{Q}$.

Theorem 5.2 $\lfloor (n^2 - 5n)/4 \rfloor \le \mathrm{Dist}(n, \mathcal{Q}) \le (n^2 - n)/4$.

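Proof. Let G be a graph on n vertices. Since $E_n, K_n \in \mathcal{Q}$, it suffices either to add
all missing edges or to delete all edges to obtain a graph from $\mathcal{Q}$; the cheaper of these two
options uses at most $\frac12\binom{n}{2} = (n^2 - n)/4$ edge-operations. Thus, the upper bound follows.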

For the lower bound, consider a graph G with $\lfloor (n^2 - n)/4 \rfloor$ edges. Assume first that the
minimum number of edit operations results in a graph whose components are either 4-cycles
or trees on at most 3 vertices. As a result, the total number of edges within these components
is at most n. Therefore, at least $|E(G)| - n$ edges of G had to be deleted.

The argument is similar if the minimum number of edit operations results in a graph whose
complement has components that are either 4-cycles or trees on at most 3 vertices.
Then at least $|E(\overline{G})| - n = \binom{n}{2} - |E(G)| - n$ edges had to be added to G.
As a result, the number of edit operations is at least $\lfloor (n^2 - n)/4 \rfloor - n = \lfloor (n^2 - 5n)/4 \rfloor$. ■
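For tiny n, both the structural fact quoted at the start of this discussion and the bounds of Theorem 5.2 can be checked exhaustively. The sketch below is ours (names are hypothetical); it uses the observation that a graph lies in $\mathcal{Q}$ exactly when no 4 of its vertices span exactly 3 edges, checks that every member of $\mathcal{Q}$ or its complement is a disjoint union of 4-cycles and trees on at most 3 vertices, and then computes $\mathrm{Dist}(n, \mathcal{Q})$ by brute force. It is only feasible for n ≤ 5 or so.

```python
from itertools import combinations


def in_Q(n, adj):
    """G is in Q iff no 4 vertices span exactly 3 edges."""
    return all(sum(frozenset(p) in adj for p in combinations(S, 2)) != 3
               for S in combinations(range(n), 4))


def components(n, adj):
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(u for u in range(n) if frozenset((u, v)) in adj)
        seen |= comp
        comps.append(comp)
    return comps


def is_c4_and_small_trees(n, adj):
    """Every component is a 4-cycle or a tree on at most 3 vertices."""
    for comp in components(n, adj):
        m = sum(frozenset(p) in adj for p in combinations(comp, 2))
        degrees = [sum(frozenset((u, v)) in adj for u in comp if u != v) for v in comp]
        small_tree = len(comp) <= 3 and m == len(comp) - 1
        four_cycle = len(comp) == 4 and m == 4 and all(d == 2 for d in degrees)
        if not (small_tree or four_cycle):
            return False
    return True


def dist_to_Q(n):
    """Brute-force Dist(n, Q), also checking the quoted structure of members of Q."""
    pairs = list(combinations(range(n), 2))
    everything = {frozenset(p) for p in pairs}
    members = []
    for mask in range(1 << len(pairs)):
        adj = {frozenset(pairs[b]) for b in range(len(pairs)) if mask >> b & 1}
        if in_Q(n, adj):
            if not (is_c4_and_small_trees(n, adj) or is_c4_and_small_trees(n, everything - adj)):
                raise AssertionError(f"structure claim fails for mask {mask}")
            members.append(mask)
    return max(min(bin(g ^ f).count("1") for f in members) for g in range(1 << len(pairs)))


print(dist_to_Q(4))  # prints 1: only the 3-edge graphs on 4 vertices are excluded
print(dist_to_Q(5))
```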



Conclusions 

The editing problem for graphs that we consider in this paper can be reformulated in terms
of complete edge-colored graphs, where the edges of the graph correspond to edges of one
color, say red, and the edges of the complement correspond to edges of another color, say
blue. Our editing operations are equivalent to changing the color of some edges from red to
blue or from blue to red.

It is natural to consider more than two colors. Specifically, for any two colorings of $E(K_n)$
using colors from a fixed finite set, we define the distance to be the smallest number of
edge-recolorings needed to obtain one coloring from the other. Our results for classes of graphs
with forbidden induced subgraphs can be generalized to classes of multicolored graphs with
forbidden color patterns. When considered on bipartite graphs, the multicolored graph editing
problem is equivalent to the problem of editing a matrix so that fixed patterns on submatrices
do not occur [...].
6 Appendix: Random graph

Let G(n, p) be a random graph in which each of the $\binom{n}{2}$ possible edges is chosen independently
with probability p (see [...]). Lemma 2.8 follows immediately from the following lemma.

Lemma 6.1 Fix a constant $\varepsilon > 0$ and a positive integer m. Let $\ell$ have the property that
$m \le \ell \le M(\varepsilon, m)$, where $M(\varepsilon, m)$ is the constant given in the Regularity Lemma (Lemma 2.3).
Let G = G(n, 1/2), let $f(n) = \omega(n^{-1/2})$, and let P be the probability that for each $(m, \varepsilon, \ell)$-equipartition
of the vertices of G, all pairs of clusters $(V_i, V_j)$, $1 \le i < j \le \ell$, have density
in the interval $(1/2 - f(n), 1/2 + f(n))$. Then P approaches 1 as n goes to infinity.

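Proof. We just want to compute the probability that all pairs of disjoint sets, each of
size at least $\varepsilon' n$ (where $\varepsilon' > 0$ is a constant depending only on $\varepsilon$ and m), have density in the
interval $(1/2 - f, 1/2 + f)$, for any $f = f(n) = \omega(n^{-1/2})$. We bound the probability of the
complementary event.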



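$$\begin{aligned}
\Pr\Bigg[\bigvee_{\substack{S,T\subseteq V(G),\ S\cap T=\emptyset\\ |S|,|T|\ge\varepsilon' n}}\Big(d(S,T)\notin(1/2-f,\,1/2+f)\Big)\Bigg]
&\le 2^n\,2^n\cdot 2\,\Pr\{d(S,T)\le 1/2-f\}\\
&\le 2\cdot 4^n\exp\!\left(-2\,\frac{(f|S||T|)^2}{|S||T|}\right) \qquad (7)\\
&= 2\cdot 4^n\exp\!\left(-2f^2|S||T|\right)\\
&\le 2\cdot 4^n\exp\!\left(-2(\varepsilon')^2f^2n^2\right)\to 0,
\end{aligned}$$

where, from the first inequality on, (S, T) denotes a fixed pair of disjoint sets with $|S|, |T| \ge \varepsilon' n$.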



Chernoff's bound (see [...]) is used to obtain inequality (7).
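To see how the final bound in (7) behaves, it helps to work with its logarithm, since $4^n$ overflows quickly. The sketch below is ours and uses the hypothetical choices $\varepsilon' = 0.1$ and $f(n) = \log n/\sqrt{n}$, which is $\omega(n^{-1/2})$; with these values the bound exceeds 1 (and is useless) for n = 100 but is already astronomically small for $n = 10^6$.

```python
import math


def log_union_bound(n, eps_prime, f):
    """Natural log of the final bound in (7): 2 * 4^n * exp(-2 (eps')^2 f^2 n^2)."""
    return math.log(2) + n * math.log(4) - 2 * (eps_prime * f * n) ** 2


def f_example(n):
    """A hypothetical f(n) = omega(n^{-1/2}); log(n)/sqrt(n) is one admissible choice."""
    return math.log(n) / math.sqrt(n)


for n in (10**2, 10**4, 10**6):
    print(n, log_union_bound(n, 0.1, f_example(n)))
```

The quadratic term $2(\varepsilon')^2 f^2 n^2 = 2(\varepsilon')^2 n (\log n)^2$ eventually dominates the linear term $n \log 4$, which is exactly why f must grow faster than $n^{-1/2}$.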






